SOLR-13764: Introduce IntervalsQParserPlugin #4582

Open
mkhludnev wants to merge 45 commits into apache:main from mkhludnev:copilot/add-json-queries-key

mkhludnev (Member) commented Jul 1, 2026

https://issues.apache.org/jira/browse/SOLR-13764

This pull request adds support for the Intervals Query Parser in Solr, enabling users to construct Lucene interval queries using a JSON-based DSL. The changes include the core implementation, integration with the standard query parser plugins, documentation updates, and new tests to ensure distributed and JSON query support.

Intervals Query Parser Implementation and Integration:

  • Added the IntervalsQParserPlugin to the set of standard query parser plugins in QParserPlugin.java, allowing interval queries to be parsed and executed.
  • Updated RequestUtil.java to recognize and correctly handle the new json_queries key in JSON requests, ensuring it is passed through for use by query parsers.

Proposed syntax:

q={!intervals df=title}$myQuery&json={json_queries:{myQuery:{match:{query:quick}}, q2:{},..}} 

This query parser reads its query definitions from json.json_queries. I propose introducing this key because I expect more use cases for pure-JSON parsing.

Alternatives worth considering:

?q={!intervals json_query=q1 df=subject}&json={json_queries:{q1:{match:{query:quick}}, q2:{},..}}
?q={!intervals param_ref=q1 df=subject}&json={param:{q1:{match:{query:quick}}, q2:{},..}}

or even param_ref=$q1 like here

Please propose your own syntax in the comments.
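
The `$myQuery` reference in the proposed syntax can be pictured as a name lookup into the parsed `json_queries` object. The sketch below is a toy model using only the JDK; the class `JsonQueryRefSketch` and its `resolve` method are hypothetical names for illustration, and how the PR actually wires the lookup inside `IntervalsQParserPlugin` is not shown here.

```java
import java.util.Map;

/** Toy model of resolving a "$name" query string against json_queries; not Solr code. */
final class JsonQueryRefSketch {

  /**
   * Returns the rule map registered under the referenced name.
   * queryString is the parser's value, e.g. "$myQuery"; jsonQueries stands in
   * for the parsed json_queries object from the request (an assumption of this sketch).
   */
  static Map<String, Object> resolve(String queryString, Map<String, Object> jsonQueries) {
    if (queryString == null || !queryString.startsWith("$") || queryString.length() < 2) {
      throw new IllegalArgumentException("Expected a reference like $myQuery, got: " + queryString);
    }
    String name = queryString.substring(1);
    Object rule = jsonQueries.get(name);
    if (!(rule instanceof Map)) {
      // A missing or malformed entry is treated as a user error in this sketch.
      throw new IllegalArgumentException("No query named '" + name + "' in json_queries");
    }
    @SuppressWarnings("unchecked")
    Map<String, Object> typed = (Map<String, Object>) rule;
    return typed;
  }

  public static void main(String[] args) {
    Map<String, Object> jsonQueries =
        Map.of(
            "myQuery", Map.of("match", Map.of("query", "quick")),
            "q2", Map.of());
    System.out.println(resolve("$myQuery", jsonQueries));
  }
}
```

Keeping several named entries under one key means a request can carry more than one definition while each `q` or `fq` selects exactly one by name.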

Testing and Validation:

  • Introduced a new test class, DistributedQParserTest.java, which verifies distributed search support for the new intervals query parser alongside existing parsers.
  • Expanded TestJsonRequest.java to test the handling and preservation of the json_queries key in JSON requests.
  • Of particular interest is a test case demonstrating an advantage of intervals over spans.

Documentation:

  • Added a new changelog entry describing the addition of the Intervals Query Parser.
  • Updated the Solr Reference Guide to document the new query parser and linked its dedicated documentation page in the navigation and parser overview.

Checklist

Please review the following and check all that apply:

  • [v] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • [v] I have created a Jira issue and added the issue ID to my pull request title.
  • [v] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • [v] I have developed this patch against the main branch.
  • [v] I have run ./gradlew check.
  • [v] I have added tests for my changes.
  • [v] I have added documentation for the Reference Guide
  • [v] I have added a changelog entry for my change

Copilot AI and others added 30 commits June 30, 2026 21:01
…tend, unordered_no_overlaps, not_within, within, at_least, no_intervals
Updated the section title and removed outdated content regarding query formats.
Updated the description to reference JSON DSL documentation.
…ns-job

fix: merge split QueryRequest constructor calls to satisfy Spotless
@mkhludnev mkhludnev requested a review from Copilot July 1, 2026 12:32
@github-actions github-actions Bot added the documentation, tests, and cat:search labels Jul 1, 2026
@mkhludnev mkhludnev requested review from dsmiley and epugh July 1, 2026 12:32
@mkhludnev mkhludnev requested a review from gerlowskija July 1, 2026 12:32

Copilot AI left a comment

Contributor

Pull request overview

Adds a new built-in intervals query parser to Solr that constructs Lucene IntervalQuery instances from a JSON-based DSL carried in the request body (under json_queries), with accompanying ref-guide documentation and tests (standalone + SolrCloud + JSON request handling).

Changes:

  • Introduces IntervalsQParserPlugin and registers it as a standard query parser plugin.
  • Extends JSON request processing to accept/pass-through a new json_queries top-level key.
  • Adds ref-guide documentation and new tests covering standalone parsing, distributed execution, and JSON preservation.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| solr/solr-ref-guide/modules/query-guide/querying-nav.adoc | Adds navigation entry for the new Intervals query parser page. |
| solr/solr-ref-guide/modules/query-guide/pages/other-parsers.adoc | Adds a short overview section for the Intervals query parser. |
| solr/solr-ref-guide/modules/query-guide/pages/intervals-query-parser.adoc | New detailed documentation page for the Intervals query parser DSL. |
| solr/core/src/test/org/apache/solr/search/TestIntervalsQParserPlugin.java | New unit tests for interval rule parsing/execution. |
| solr/core/src/test/org/apache/solr/search/json/TestJsonRequest.java | Adds coverage ensuring json_queries is preserved in parsed JSON. |
| solr/core/src/test/org/apache/solr/search/DistributedQParserTest.java | New SolrCloud test ensuring intervals works in distributed search. |
| solr/core/src/java/org/apache/solr/search/QParserPlugin.java | Registers intervals as a standard plugin. |
| solr/core/src/java/org/apache/solr/search/IntervalsQParserPlugin.java | Core implementation of the JSON-to-Intervals DSL parser. |
| solr/core/src/java/org/apache/solr/request/json/RequestUtil.java | Allows json_queries as a recognized JSON request top-level key. |
| changelog/unreleased/SOLR-13764-intervals-query-parser.yml | Adds an unreleased changelog entry for the feature. |

Comment thread solr/core/src/java/org/apache/solr/search/IntervalsQParserPlugin.java Outdated
Comment on lines +170 to +171
Analyzer analyzer = resolveAnalyzer(params, field, "wildcard");
String normalizedPattern = normalizeMultiTerm(field, pattern, analyzer);
Comment on lines +183 to +190
Analyzer analyzer = resolveAnalyzer(params, field, "fuzzy");
String normalizedTerm = normalizeMultiTerm(field, term, analyzer);

String fuzziness = getOptionalString(params, "fuzziness", "fuzzy");
int maxEdits = resolveFuzziness(fuzziness, normalizedTerm);
int prefixLength = getInt(params, "prefix_length", 0, "fuzzy");
boolean transpositions = getBoolean(params, "transpositions", true, "fuzzy");

Comment on lines +236 to +242
Object termsObj = params.get("terms");
Object intervalsObj = params.get("intervals");
if (termsObj == null && intervalsObj == null) {
  throw new SolrException(
      SolrException.ErrorCode.BAD_REQUEST,
      "Rule 'phrase' requires either 'terms' (string array) or 'intervals' (rule array)");
}
Comment on lines +271 to +273
int maxExpansions =
    getInt(params, "max_expansions", Intervals.DEFAULT_MAX_EXPANSIONS, "regexp");
IntervalsSource source = Intervals.regexp(new BytesRef(pattern), maxExpansions);
Comment on lines +285 to +289
int maxExpansions =
    getInt(params, "max_expansions", Intervals.DEFAULT_MAX_EXPANSIONS, "range");
BytesRef lowerTerm = lowerTermStr == null ? null : new BytesRef(lowerTermStr);
BytesRef upperTerm = upperTermStr == null ? null : new BytesRef(upperTermStr);
return Intervals.range(lowerTerm, upperTerm, includeLower, includeUpper, maxExpansions);

The Intervals Query Parser (`IntervalsQParserPlugin`) builds Lucene interval queries from a xref:json-request-api.adoc[JSON DSL] description.
Interval queries allow you to express positional constraints such as "these terms must appear within N positions of each other" or "this phrase must appear before that one".
See the https://lucene.apache.org/core/10_4_0/queries/org/apache/lucene/queries/intervals/package-summary.html[Lucene Intervals package documentation] for a detailed description of the underlying interval machinery.
|`query`
|Yes
|—
|The text to analyse and match.
Comment on lines +431 to +434
String analyzerName = getOptionalString(params, "analyzer", ruleName);
if (analyzerName == null) {
  return req.getSchema().getQueryAnalyzer();
}
Comment on lines +257 to +260
} else if ("json_queries".equals(key)) {
  // passed through as a parsed object for use by SearchComponent.prepare() at subordinate
  // nodes; not processed here
  continue;
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Comment thread solr/core/src/java/org/apache/solr/search/IntervalsQParserPlugin.java Outdated
Comment thread solr/core/src/java/org/apache/solr/search/IntervalsQParserPlugin.java Outdated
Comment thread solr/core/src/java/org/apache/solr/search/IntervalsQParserPlugin.java Outdated
Comment thread solr/core/src/java/org/apache/solr/search/IntervalsQParserPlugin.java Outdated
mkhludnev and others added 4 commits July 1, 2026 16:32
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
multiterm analyzer and QueryEqualityTest

hiSandog commented Jul 2, 2026

The JSON DSL and distributed-query pass-through are substantial additions. One edge case worth testing explicitly is validation behavior when json_queries contains multiple named queries and a subordinate node receives a request with only the selected json_query local param. The new RequestUtil exemption for json_queries is important, so a distributed test that uses two named interval queries and selects one would guard against accidental request-param flattening or loss during shard forwarding.
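
The pass-through behaviour this comment wants guarded can be modelled without Solr. The sketch below is a toy: `JsonKeyPassThroughSketch`, `passThrough`, and the small key-to-parameter table are hypothetical stand-ins, not the code in `RequestUtil`. It only illustrates the property under discussion: flat keys become request parameters, while `json_queries` survives as a structured object with all of its named entries, even when the query selects just one of them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of keeping json_queries structured while other keys become params. */
final class JsonKeyPassThroughSketch {

  static final String JSON_QUERIES = "json_queries";

  // Small illustrative subset of top-level JSON keys and the params they map to.
  private static final Map<String, String> KEY_TO_PARAM =
      Map.of("query", "q", "limit", "rows", "offset", "start");

  /**
   * Copies flat keys into paramsOut and returns the entries that must be
   * forwarded untouched (here only json_queries).
   */
  static Map<String, Object> passThrough(Map<String, Object> json, Map<String, String> paramsOut) {
    Map<String, Object> kept = new LinkedHashMap<>();
    for (Map.Entry<String, Object> e : json.entrySet()) {
      if (JSON_QUERIES.equals(e.getKey())) {
        kept.put(e.getKey(), e.getValue()); // not flattened, not filtered by the selected name
        continue;
      }
      String param = KEY_TO_PARAM.get(e.getKey());
      if (param == null) {
        throw new IllegalArgumentException("Unknown top-level key: " + e.getKey());
      }
      paramsOut.put(param, String.valueOf(e.getValue()));
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<String, Object> body = new LinkedHashMap<>();
    body.put("query", "{!intervals df=title}$first");
    body.put(
        JSON_QUERIES,
        Map.of(
            "first", Map.of("match", Map.of("query", "quick")),
            "second", Map.of("match", Map.of("query", "fox"))));
    Map<String, String> params = new LinkedHashMap<>();
    Map<String, Object> kept = passThrough(body, params);
    System.out.println(params.get("q"));
    System.out.println(((Map<?, ?>) kept.get(JSON_QUERIES)).size());
  }
}
```

A real distributed test would assert the same thing end to end: index documents across shards, send two named interval queries, select one, and check that every shard returns the expected hits.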

@dsmiley dsmiley left a comment

Contributor

Glad to see Solr finally having a native intervals query parser!

Not sure I'm glad to see it require JSON... but I sympathize. It'd be nice if it could support QParser parsed Strings -> Query and then convert Query to Interval. Could come later.

* Distributed search tests for the standard query parsers: {@code lucene}, {@code dismax}, {@code
* edismax}, and {@code intervals}.
*/
public class DistributedQParserTest extends SolrCloudTestCase {

Contributor

Why have this... shouldn't this be only for intervals?

Contributor

And if the intent is testing distributed... well we have a whole test framework for that which is quite good: BaseDistributedSearchTestCase. Doesn't even require SolrCloud!

Member Author

AFAIK SolrCloudTestCase is faster and more convenient. It's a little redundant for a local-only QP, but there's a comment (https://github.com/apache/solr/pull/4582#issuecomment-4861727946) about transferring new JSON request entries, so I decided to double-check.
We can drop it if you wish.


[source,text]
----
q={!intervals json_query=myQuery df=title}

Contributor

This looks abnormal... a normal query parser parses v. If v needs to refer to something else that's a string, it'd just be $paramName, but I see here there's an attempt to retain the JSON structure rather than use a string. If the paramName itself contains "json", maybe we can enhance Solr to recognize that and pass it as parsed JSON? This would mean the referenced JSON in question would be separate from json=. I wonder if there's precedent in Solr for this. I confess I don't use the JSON query stuff regularly, so I'm less familiar with it.

Just brainstorming a bit here, that's all.

Member Author

Pardon, just changed it to

q={!intervals df=title}$myQuery
&
json={
  "json_queries": {
    "myQuery": {
      "match": { "query": "apache solr" }
    }
  }
}

Also, we might name the parser jintervals or json_intervals to hint at where to look.

*** xref:json-combined-query-dsl.adoc[]
** xref:searching-nested-documents.adoc[]
** xref:block-join-query-parser.adoc[]
** xref:intervals-query-parser.adoc[]

Contributor

This is placed illogically, shoved between two join query parser pages

Member Author

moved

** xref:other-parsers.adoc[]
*** xref:intervals-query-parser.adoc[]

Does that make sense?

public Query parse() {
  String jsonQueryName = localParams.get(JSON_QUERY_PARAM);
  if (jsonQueryName == null) {
    return new MatchNoDocsQuery("No " + JSON_QUERY_PARAM + " parameter specified");

Contributor

sounds more like user error to me

Member Author

Right. It's stricter now.

private Analyzer resolveAnalyzer(Map<String, Object> params, String field, String ruleName) {
  String analyzerName = getOptionalString(params, "analyzer", ruleName);
  if (analyzerName == null) {
    return req.getSchema().getFieldTypeNoEx(field).getQueryAnalyzer();

Contributor

Why call getFieldTypeNoEx -- wouldn't we want to throw if not found?

String analyzerName = getOptionalString(params, "analyzer", ruleName);
FieldType fieldType =
    analyzerName == null
        ? req.getSchema().getFieldTypeNoEx(field)

Contributor

same -- why getFieldTypeNoEx

return source;
}

private IntervalsSource parseWildcardRule(Map<String, Object> params, String topField) {

Contributor

Here and everywhere, "topField" is passed. Not sure what "top" means. Anyway, it would be more type-safe to use SchemaField. From the SchemaField, you can resolve the field type easily. Strings are essentially typeless and can lead to confusion and bugs from passing the wrong parameter. Using SchemaField could also avoid repeated type lookups.
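
The typing argument, together with the earlier question about `getFieldTypeNoEx`, can be illustrated with a small JDK-only sketch. Everything in it is hypothetical: `FieldDef` is a stand-in for a schema field object, and `getField` and `describeWildcardRule` are invented for illustration rather than taken from Solr or from this PR. It shows a lookup that fails fast on an unknown field, and a rule builder that accepts the resolved field object instead of a bare string.

```java
import java.util.Map;

/** Toy model of passing a resolved, typed field instead of a field-name string. */
final class TypedFieldSketch {

  /** Hypothetical stand-in for a schema field: its name plus its resolved type name. */
  static final class FieldDef {
    final String name;
    final String typeName;

    FieldDef(String name, String typeName) {
      this.name = name;
      this.typeName = typeName;
    }
  }

  private final Map<String, FieldDef> schema;

  TypedFieldSketch(Map<String, FieldDef> schema) {
    this.schema = schema;
  }

  /** Fail-fast lookup: an unknown field is reported at once rather than returned as null. */
  FieldDef getField(String name) {
    FieldDef field = schema.get(name);
    if (field == null) {
      throw new IllegalArgumentException("undefined field: " + name);
    }
    return field;
  }

  /** Takes the typed field, so swapping the field and pattern arguments does not compile. */
  static String describeWildcardRule(FieldDef field, String pattern) {
    return "wildcard(" + field.name + ":" + pattern + ") analyzed as " + field.typeName;
  }

  public static void main(String[] args) {
    TypedFieldSketch schema =
        new TypedFieldSketch(Map.of("title", new FieldDef("title", "text_general")));
    FieldDef title = schema.getField("title"); // resolved once, then reused by every rule
    System.out.println(describeWildcardRule(title, "qu*"));
  }
}
```

Resolving the field once at the top of parsing and handing the object down also removes the repeated per-rule lookups the comment mentions.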

Member Author

topField means the field we build intervals against unless it is explicitly overridden. It's exactly what we have with spans in "{!xmlparser df=v_ws}". OK, it's better to name it df. Thanks for the typing suggestion; I'll try to apply it.

mkhludnev added 2 commits July 3, 2026 00:27
extract JSON_QUERIES_KEY constant and update documentation for new syntax
use typed SchemaField
"intervals qparser should support all_of with nested any_of via df local param",
req(
"q",
"{!intervals df=title_t}$second_query",

@ercsonusharma ercsonusharma Jul 3, 2026

Contributor

I see the proposed request structure as a combination of the standard query syntax and the JSON query syntax. It could even work if the q parameter were moved into the JSON request as the query field, using either the local-params string format or the expanded form shown in the documentation example. Since this is intended for use with the JSON syntax, that would be more consistent and convenient, IMHO.

We could also use a JSON query string across the tests for readability.

Member Author

Would you like this style of request across all tests?
c1b4635#diff-ec02755a65648a1393267be481208643fd7d1c60834284719565033b32ce7842R185
